-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of Iceberg remove_orphan_files #14383
Improve performance of Iceberg remove_orphan_files #14383
Conversation
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_GET_LENGTH), 6) | ||
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_NEW_STREAM), 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're still reading each manifest twice, I'll take a look at that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that first every manifest is read here:
for (ManifestFile manifest : snapshot.allManifests(table.io()))
and then again here:
ManifestFiles.read(manifest, table.io());
ManifestFiles.readDeleteManifest(manifest, table.io(), table.specs());
but this is still huge improvement as previously these were the numbers:
.addCopies(new FileOperation(SNAPSHOT, INPUT_FILE_GET_LENGTH), 4)
.addCopies(new FileOperation(SNAPSHOT, INPUT_FILE_NEW_STREAM), 4)
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_GET_LENGTH), 24)
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_NEW_STREAM), 24)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's part of how the ManifestReader works. The factory ManifestFiles.read
opens the file once to read the metadata, and then calling iterator
opens the file again. I don't think there's much we can do about that
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Outdated
Show resolved
Hide resolved
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Outdated
Show resolved
Hide resolved
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Show resolved
Hide resolved
@@ -384,10 +385,41 @@ public void testPredicateWithVarcharCastToDate() | |||
assertUpdate("DROP TABLE test_varchar_as_date_predicate"); | |||
} | |||
|
|||
@Test | |||
public void testRemoveOrphanFiles() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you planning on adding a test for snapshot expiration as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, probably in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...n/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMetadataFileOperations.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
validMetadataFilePaths.addAll(metadataFileLocations(table, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else does it add beyond what we already collected ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This add an optional list of things from the metadata-log
,
A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit.
--
<br class="Apple-interchange-newline">
} | ||
|
||
validMetadataFilePaths.addAll(metadataFileLocations(table, false)); | ||
validMetadataFilePaths.add(versionHintLocation(table)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReachableFileUtil
could be a nice help here
in apache/iceberg#5795 it seems there is no single central place in Iceberg where ReachableFileUtil
is used; rather individual helper method calls are scattered across codebase.
wonder if there is some reuse opportunity still
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think fixing it up I'd rather move this method into the Iceberg lib and have it defined once for Spark and Trino
validMetadataFilePaths.add(versionHintLocation(table)); | ||
|
||
scanAndDeleteInvalidFiles(table, session, schemaTableName, expireTimestamp, union(validDataFilePaths.build(), validDeleteFilePaths.build()), "data"); | ||
scanAndDeleteInvalidFiles(table, session, schemaTableName, expireTimestamp, validMetadataFilePaths, "metadata"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an error to have a data file in a metadata directory (or vice versa)?
is table corruption a fair punishment?
i guess this is no behavioral change here in this PR, so no change requested. But also, since you collect full paths, it seems easy to change if we want to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave it for now, but if we support more of the Iceberg write location properties / LocationProvider implementations we'll need to change this.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
@@ -5256,11 +5256,14 @@ public void testRemoveOrphanFiles() | |||
Session sessionWithShortRetentionUnlocked = prepareCleanUpSession(); | |||
assertUpdate("CREATE TABLE " + tableName + " (key varchar, value integer)"); | |||
assertUpdate("INSERT INTO " + tableName + " VALUES ('one', 1)", 1); | |||
assertUpdate("INSERT INTO " + tableName + " VALUES ('two', 2), ('three', 3)", 2); | |||
assertUpdate("DELETE FROM " + tableName + " WHERE key = 'two'", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change existing test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't exercise delete files, and also didn't validate that we didn't corrupt the table. Seemed like reasonable additions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Let's have it as a separate commit, so that we can capture the rationale in the commit message.
otherwise it looks like a left-over from local testing
0016e30
to
538cfef
Compare
e1efed2
to
a9afb2b
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
a9afb2b
to
58e4b91
Compare
Is |
Yes. Looks like the change I added to use full paths doesn't handle trailing slashes well |
93f8632
to
dd57760
Compare
I switched back to using file names rather than full paths |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
dd57760
to
88dd805
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
4306403
to
7750214
Compare
Addressed, thanks @findepi |
} | ||
|
||
for (ManifestFile manifest : snapshot.allManifests(table.io())) { | ||
if (!processedManifestFilePaths.add(manifest.path())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to do
validMetadataFileNames.add(fileName(manifest.path())
do we not need to populate validMetadataFileNames
?
is it because metadataFileLocations(table, false).stream()
will do this for us?
if so, maybe we don't need validMetadataFileNames.add(fileName(snapshot.manifestListLocation()));
either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I get for thinking, I don't need to rerun the tests it was just a small refactor 😦
// Using file names may result in retaining files which could be removed. However, in practice Iceberg metadata | ||
// and data files have UUIDs in their names which makes this unlikely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for capturing this reasoning.
Can you also add a note why not using full paths in these sets?
7750214
to
7ce967d
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
7ce967d
to
3e50c1e
Compare
Set<String> processedManifestFilePaths = new HashSet<>(); | ||
// Using file names may result in retaining files which could be removed. However, in practice Iceberg metadata | ||
// and data files have UUIDs in their names which makes this unlikely. Collecting file names here rather than paths | ||
// makes this comparison easier when checking if files should be retained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this explanation.
I don't quiet get it yet. Intuitively, it's as easy to compare paths as it's to compare file names.
What problem did you face where you tried that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testCleaningUpWithTableWithSpecifiedLocation
was failing because /a/slash.txt
vs /a//slash.txt
resolve to the same file but have different paths.
I'll update the comment to address this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because
/a/slash.txt
vs/a//slash.txt
resolve to the same file but have different paths.
But only one is "correct".
I think i understand. The metadata may contain entries with double slashes. If we're running on a file system that normalizes repeated slashes (like .. not S3), the table works. To avoid breaking it here, we cannot compare paths.
3e50c1e
to
918ad53
Compare
Add testing for deletes and ensure the operation does not corrupt the table.
Avoid reading the Avro manifest files multiple times while building the set of live files.
918ad53
to
ee36f85
Compare
rebased |
Cassandra tests red. |
We pass impacted features to the product test launcher and it should be able to skip only some tests in a suite, depending on connectors (a kind of feature) used per environment. But impact analysis was disabled, you can tell by looking at the I'm not sure why, but that failed job doesn't match what's currently in this branch. There were changes in tests:
|
@nineinchnick Can it be these are reported as "impacted" because both these test modules depend on |
Yes, the list of impacted modules should only contain modules with actual changes in them and their dependants, not their dependencies. I also think I made a typo in #14541, using |
Description
Avoid reading manifests multiple times during the remove_orphan_files valid file set construction.
Fixes: #13691
Non-technical explanation
Reduce the file I/O required during remove_orphan_files
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: